Support vector machine (SVM) models for predicting inhibitors of the 3’ processing step of HIV-1 integrase

Xuan, S.Y.; Wang, M.L.; Kang, H.; Kirchmair, J.; Tan, L.; Yan, A.X.*
Molecular Informatics, 2013, 32(9-10), 811-826.

    Inhibition of the 3’ processing step of HIV-1 integrase by small molecule inhibitors is one of the most promising strategies for the treatment of AIDS. Using a support vector machine (SVM) approach, we developed six classification models for predicting 3’P inhibitors. The models are based on up to 48 selected molecular descriptors and a comprehensive data set of 1253 molecules, with measured activities ranging from nanomolar to micromolar IC50 values. Model B2, the most robust SVM model, obtains a prediction accuracy, sensitivity, specificity and Matthews correlation coefficient (MCC) of 93 %, 81 %, 94 % and 0.67 on the test set, respectively. The presence of hydrogen bonding features and hydrophilicity in general were identified as key determinants of inhibitory activity. Further important properties include molecular refractivity, π atom charge, total charge, lone pair electronegativity, and effective atom polarizability. Comparative fragment-based analysis of the active and inactive molecules corroborated these observations and revealed several characteristic structural elements of 3’P inhibitors. The models built in this study can be obtained from the authors.

Classification model performance (data set: 1253 3’P inhibitors of HIV-1 integrase)

| Model | Algorithm | Descriptors | Splitting method | Training set size | Training accuracy (%) | 5-fold CV accuracy (%) | 10-fold CV accuracy (%) | LOO CV accuracy (%) | Test set size | Test SE (%) | Test SP (%) | Test accuracy (%) | Test MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model A1 | SVM | 41 MOE | Random | 493 | 95.94 | 82.76 | 82.56 | 83.57 | 760 | 69.33 | 88.91 | 86.97 | 0.4641 |
| Model A2 | SVM | 41 MOE | Kohonen’s self-organizing map (SOM) | 537 | 92.92 | 79.70 | 79.33 | 79.70 | 716 | 68.83 | 93.58 | 90.92 | 0.5726 |
| Model B1 | SVM | 41 MOE + 7 RDF | Random | 493 | 99.39 | 84.38 | 83.37 | 85.40 | 760 | 69.33 | 90.07 | 88.03 | 0.4859 |
| Model B2 | SVM | 41 MOE + 7 RDF | Kohonen’s self-organizing map (SOM) | 537 | 98.32 | 79.70 | 79.89 | 81.56 | 716 | 80.52 | 94.21 | 92.74 | 0.6707 |
| Model C1 | SVM | MACCS | Random | 493 | 96.35 | 81.74 | 83.77 | 84.18 | 760 | 38.51 | 96.32 | 86.05 | 0.4465 |
| Model C2 | SVM | MACCS | Kohonen’s self-organizing map (SOM) | 537 | 92.92 | 81.01 | 81.38 | 80.45 | 716 | 51.92 | 96.24 | 89.80 | 0.5478 |

CV: cross-validation on the training set; LOO: leave-one-out; SE: sensitivity; SP: specificity; MCC: Matthews correlation coefficient.
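
The random split (493 training and 760 test molecules) and the 5-fold, 10-fold and leave-one-out cross-validation schemes listed in the table can be illustrated on molecule indices alone. This is only a sketch under stated assumptions: the function names, the seed and the shuffling strategy are hypothetical, and the page does not say whether the authors stratified by activity class. The SOM-based split is not shown.

```python
import random


def random_split(n_total, n_train, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into training and test parts."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def k_fold(indices, k, seed=0):
    """Yield (train, held_out) index lists for k-fold cross-validation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    # Every k-th element goes to the same fold, so fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


# Sizes taken from the table: 1253 molecules, 493 in the random training set.
train_idx, test_idx = random_split(1253, 493)
five_fold = list(k_fold(train_idx, 5))
# Leave-one-out is k-fold with k equal to the number of training molecules.
loo = list(k_fold(train_idx, len(train_idx)))
print(len(train_idx), len(test_idx), len(five_fold), len(loo))
```

In each round a model would be fitted on `train` and scored on `held_out`; averaging the per-round accuracies gives one cross-validation figure of the kind reported in the table.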

Main project members

Xuan Shouyi (宣首逸)

PhD student

Wang Maolin (王茂林)

PhD student